Reproducible Research: who, what, where, why, when & how

SESYNC Computational Summer Institute July 2015

Install the shiny package in RStudio Server

install.packages('shiny')

Overview

  • Motivation & context
  • Concepts and vocabulary
  • General principles
  • Survey of the tool landscape
  • Dissemination example with RShiny

Who is reproducible research for?

Have you ever tried…

  • to reproduce someone else's data analysis before?
  • to reproduce your own work before?

Was it…

  • impossible
  • difficult
  • possible
  • easy

Don’t let this be you!

Who is reproducible research for?

  • you, now and in the future
  • collaborators
  • reviewers & editors
  • funding agencies

What is reproducible research?

What is open source software?

Are these synonyms?

  • share
  • publish
  • archive

Access, understanding, sharing

Peng. 2011. Reproducible Research in Computational Science. Science 334:1226-1227.

“The goal of reproducible research is to tie specific instructions to data analysis and experimental data so that scholarship can be recreated, better understood and verified.” - Max Kuhn, CRAN Task View: Reproducible Research



i.e., raw data + instructions

What to share

Archive

  • starting dataset
  • metadata
  • data cleaning steps
  • analysis scripts
  • source code
  • readme

Share maybe

  • raw data
  • processed/cleaned data
  • intermediate results

What NOT to share

  • confidential data
  • copyrighted material
  • pre-existing restrictive licenses
  • your passwords and private keys
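One common way to keep passwords and keys out of shared scripts is to read them from environment variables at run time. The sketch below is illustrative only: the helper get_secret and the variable name MY_API_TOKEN are hypothetical, and it assumes the value is set outside the script (for example in a local ~/.Renviron file that is never archived).

```r
# Hypothetical helper: fetch a secret from an environment variable
# instead of hard-coding it in a script that will be shared.
get_secret <- function(name) {
  value <- Sys.getenv(name, unset = NA)
  if (is.na(value) || !nzchar(value)) {
    stop("Environment variable '", name, "' is not set.", call. = FALSE)
  }
  value
}

# Demonstration only: a real script would never set the value itself.
# MY_API_TOKEN is an assumed name, not one used by any particular service.
Sys.setenv(MY_API_TOKEN = "placeholder-value")
token <- get_secret("MY_API_TOKEN")
```

The shared script then contains only the variable name, so archiving or publishing it does not expose the secret.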

How to choose an appropriate repository?

  • is there a domain specific repository?
  • what are the backup & replication policies?
  • is there a plan for long-term preservation?
  • can people find your materials?
  • is it citable? (does it provide DOIs)
  • is your purpose archival, sharing or publication?

Why reproducible research?

Why?

Increased visibility and citation

Piwowar & Vision (2013) “Data reuse and the open data citation advantage.” PeerJ, e175

Figure 1: Citation density for papers with and without publicly available microarray data, by year of study publication.

Better research

Wicherts et al. (2011) “Willingness to Share Research Data Is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results.” PLoS ONE 6(11): e26828

Figure 1. Distribution of reporting errors

More efficient, less redundant science

When to think about reproducibility?

When to think about reproducibility?

  • now
  • before you start a project
  • at publication

Tools for reproducible research

File organization: a mighty weapon against chaos

A good project layout helps ensure

  • Integrity of the data
  • Portability of the project
  • Ease of picking the project back up after a break
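As a rough illustration of such a layout, the sketch below creates a project skeleton in R. The folder names and the helper create_project are assumptions made for this example, not a standard; the point is only that raw data, processed data, scripts and results each get their own place.

```r
# Hypothetical layout; the folder names are an assumption, not a standard.
create_project <- function(root) {
  subdirs <- c("data/raw", "data/processed", "scripts", "results", "docs")
  for (d in file.path(root, subdirs)) {
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }
  # Add a README stub only if one does not exist yet
  readme <- file.path(root, "README.txt")
  if (!file.exists(readme)) {
    writeLines(
      "Describe the project, the data sources, and how to run the scripts.",
      readme
    )
  }
  invisible(root)
}

# Build the skeleton in a temporary directory so nothing real is touched
project_root <- file.path(tempdir(), "my-project")
create_project(project_root)
list.files(project_root, recursive = TRUE, include.dirs = TRUE)
```

Keeping raw data in its own folder, and treating it as read-only, supports data integrity; using paths relative to the project root supports portability.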

Help find and use your files again

  • Machine readable
    use delimiters deliberately; avoid spaces, punctuation, and accented characters
  • Human readable
    the name says something about the content
  • Default ordering
    put something numeric first; use the ISO 8601 standard for dates (YYYY-MM-DD); left-pad numbers with zeros
  • File formats
    use non-proprietary file formats such as .csv and .txt rather than Word, Excel, PDFs, or images
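The naming guidelines above can be sketched as a small R helper. The function make_file_name and its naming scheme (date, step number, description) are hypothetical, chosen here only to show ISO 8601 dates, zero padding and deliberate delimiters working together.

```r
# Hypothetical helper: build a file name that sorts well and parses easily.
# "_" separates fields; "-" separates words within a field.
make_file_name <- function(date, step, description, ext = "csv") {
  stopifnot(inherits(date, "Date"))
  # ISO 8601 date (YYYY-MM-DD) sorts chronologically as plain text
  date_part <- format(date, "%Y-%m-%d")
  # Left-pad the step number with zeros so 02 sorts before 10
  step_part <- sprintf("%02d", as.integer(step))
  # Lowercase, then replace each run of anything other than a-z or 0-9
  # (spaces, punctuation, accented characters) with a single hyphen
  slug <- gsub("[^a-z0-9]+", "-", tolower(description))
  slug <- gsub("^-|-$", "", slug)
  paste0(date_part, "_", step_part, "_", slug, ".", ext)
}

make_file_name(as.Date("2015-07-20"), 3, "Site Survey (cleaned)")
# "2015-07-20_03_site-survey-cleaned.csv"
```

Because each field is separated by an underscore, a call such as strsplit(name, "_") recovers the date, step and description from the file name later on.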

Markdown in R and RShiny

Plug: rOpenSci

More references & resources

Python dashboards